Processor array.



A processor array comprises a number of
interconnected processing elements (PE). Each
processing element (PE) includes row and column select
inputs connected via respective row and column
select lines to a control unit (MCU) for the array. The
row select inputs in each row and the column select
inputs in each column are connected in common.
The processing elements (PE) receive broadcast row
and column data over the respective row and
column select lines.

PROCESSOR ARRAY



The present invention relates to processor ar- 
rays for use in parallel processing computer sys- 
tems. An example of such an array is disclosed 
and claimed in GB-A-1445714. 

Typically such a processor array comprises a
number of processing elements arranged in rows
and columns. Each processing element, other than
those on the very edge of the array, is connected
to its four nearest neighbours in the array in the
North, South, East, and West directions to permit
transfer of data between them. Each processing
element is also connected to data buses
associated with its respective row and column so that, in
addition to the transfer of data over nearest-neighbour
connections, data can be broadcast to entire
rows or columns.

The processing elements are connected to a
control unit which controls the addressing of the
array. In particular each element has a column
select input and a row select input connected via
respective row and column select lines to the
control unit. These inputs when TRUE enable the store
associated with a respective processing element.
Thus to read out the values from the store of a
particular row of elements, for example, all of the
select column (SC) inputs are set TRUE and the
select row (SR) inputs in the selected row are set
TRUE but are FALSE elsewhere. Then, only in the
selected row of processing elements, the data output
is equal to the input from the associated store and
can be read out over data output lines connected
between every element in the row and an AND
logic unit associated with the row to provide the
required row output.

According to the present invention, a processor
array comprising a number of interconnected
processing elements, each processing element having
row select and column select inputs connected via
respective row and column select lines to a control
unit, is characterised in that the row and column
select inputs in each row or column are connected
in common and the processing elements are
arranged to receive broadcast row and column data
over their respective row and column select lines.

As described above, it is known to provide in a
processor array, in addition to paths for the
transmission of data between individual elements, means
to broadcast data to an entire row or column. This
requires data paths connected in common to all the
elements in a given row or column. In known
systems dedicated broadcast data buses have
been provided for this purpose. The present
inventors have found however that the processing
elements can be arranged so that the same lines
which carry the row and column select signals can
also be used to carry the broadcast row and
column data, thereby considerably reducing the wiring
complexity of the array as a whole.

Preferably each processing element includes 
an activity control unit and the activity control unit 
is arranged to receive activity control signals from 
the row and column select inputs. 

It is known to have activity control units asso- 
ciated with each processing element of the array. 
In simple terms these activity control units function 
to turn their respective processing elements ON or 
OFF so that, for example, when data is broadcast 
to the array as a whole certain elements can be 
masked by turning their associated activity control 
units OFF. As described below, in the present 
invention the activity unit is arranged so that its 
control inputs are received via the same data paths 
that are used for column or row selection and for 
the broadcasting of data. This enables a further 
reduction in the wiring complexity of the process- 
ing element chip. 

Preferably the processor array includes mul- 
tiplexers connected between the control unit and 
the row and column select lines and arranged to 
multiplex row or column select signals and broad- 
cast row or column data onto the lines. Preferably 
the processor array further includes an address 
decoder arranged to receive row or column ad- 
dresses from the control unit and to transmit to the 
multiplexers appropriate row or column select sig- 
nals derived from the row or column addresses. 

An array in accordance with the present
invention is now described in detail with reference to
the accompanying drawings, in which:

Figure 1 is a block diagram showing a sys- 
tem incorporating a processor array in accordance 
with the present invention; 

Figure 2 is a block diagram showing one of
the processing elements of Figure 1;

Figure 3 is a detailed schematic of such a 
processing element; 

Figure 4 is a schematic of the decoder of
Figure 1;

Figure 5 is a schematic of the multiplexer of 
Figure 1; 

Figure 6 is a diagram showing wiring routes 
for row and column responses of the array to the 
MCU; and 

Figure 7 is a diagram showing the wiring 
routes for the transmission of data and address 
signals to the array. 

A parallel processing computer system
comprises an array 1 connected via a control unit MCU
to a host computer 2. The array 1 is formed of
single-bit processing elements PE arranged in rows
and columns. Each processing element PE, with
the exception of those on the edge of the array, is
connected to its four nearest neighbours.

In practice, although the topology and
connectivity of the array are as described, the physical
arrangement of the processing elements PE may
be other than in a flat square array. For example
the array may be folded back on itself to give an
arrangement in which there are two or more layers
of processing elements PE arranged one above
another. In the preferred example the array
comprises 64 processing elements PE arranged 8 x 8.
In practice a number of such arrays may be joined
together to form a larger array of overall size 64 x
64.

Each processing element PE has associated 
with it a local memory from which data is read and 
to which data is written. There is also within the 
processing element PE an arithmetic unit 3 com- 
prising an adder and associated operand registers. 
The values written to the local memory may be the
results of a calculation within the arithmetic unit 3
or alternatively may be values received directly
from the nearest neighbour connections (N, S, E, W)
or may be broadcast row or column data received
via the row/column select lines. The data paths
within each processing element PE are shown in 
Figure 2. Each processing element PE is con- 
nected to row/column select lines. All the row se- 
lect lines in a given row and all the column select 
lines in a given column are connected in common. 
The lines are formed as buses extending along 
each row or column with branches extending to 
each processing element PE. 

In use the control unit MCU distributes address 
and row/column data in the manner described be- 
low and also controls the functioning of the array 
by broadcasting control signals to the processing 
elements PE, to the decoder DEC and to the 
multiplexers MUX. Three-bit addresses are output
by the control unit MCU. The decoder DEC, shown
in detail in Figure 4, decodes these addresses to
produce appropriate row or column select signals 
for the processing elements PE. The first stage of 
the decoder produces a unique 0 on the row or 
column select line of the row or column that is to 
be selected. The second stage provides the option 
of an alternative method of addressing the array in 
which all rows or columns are selected. In any 
particular instruction the MCU either broadcasts 
data to the array, or receives data from the array, 
or neither of these. Hence at the boundary of the 
array (the dotted line in Figure 1) the data paths 
between the MCU and the array are provided by 
single bidirectional buses which at different times 
carry data both to and from the array. 
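
The two decoder stages described above can be pictured with a short behavioural sketch. It is an illustration only, not the circuit of Figure 4: the function name `decode`, its parameters and the modelling of the second stage as a simple override are assumptions made for this example.

```python
def decode(address: int, select_all: bool = False, width: int = 8) -> list[int]:
    """Return the levels on the select lines of one axis (rows or columns).

    First stage: a unique 0 on the addressed line, 1 on every other line.
    Second stage (assumed here to act as an override): every line is 0,
    so that all rows or columns are selected.
    """
    if not 0 <= address < width:
        raise ValueError("address out of range")
    if select_all:
        return [0] * width
    return [0 if line == address else 1 for line in range(width)]


# A three-bit address covers the eight rows (or columns) of the 8 x 8 array.
print(decode(5))
print(decode(5, select_all=True))
```

A three-bit address is sufficient because the preferred array has eight rows and eight columns.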

In addition to carrying row/column select
signals from the decoder DEC, the row and column
select lines are also used to carry broadcast row or
column data. This data is multiplexed with the
row/column select signals by the multiplexer MUX.
Figure 5 shows in detail the multiplexing circuit for
row data: this is duplicated for column data. The
main element of this circuit is a number of 2:1
multiplexers which receive column select signals
from the decoder DEC and broadcast row or
column data from the control unit MCU. The
multiplexers output the row and column select signals and
the broadcast row and column data in appropriate
time slots on their respective row and column
select lines. In the preferred example both TRUE
("ROW", "COL") and complement ("ROWB",
"COLB") select lines are provided for each row and
column. The multiplexer MUX is arranged to
provide corresponding TRUE and complement
outputs. The 2:1 multiplexers are followed by further
multiplexers HMUX used for hold and serial
diagnostics. The ROW select lines ROW0, ROW1,
ROW2... have a unique zero or are all set to zero,
or carry row data.
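
As an informal illustration of the multiplexing just described, the sketch below models one bank of 2:1 multiplexers for the row lines. The function name `mux_row_lines` and the control flag `send_data` are hypothetical, and the HMUX stage is omitted; only the choice between select signals and broadcast data, and the generation of TRUE and complement outputs, is shown.

```python
def mux_row_lines(select: list[int], data: list[int],
                  send_data: bool) -> tuple[list[int], list[int]]:
    """Model a bank of 2:1 multiplexers, one per row line.

    In a given time slot each line carries either the decoder's select
    signal or one bit of broadcast row data. Returns the levels on the
    TRUE lines (ROW) and on the complement lines (ROWB).
    """
    if len(select) != len(data):
        raise ValueError("select and data must have one bit per row")
    row = list(data if send_data else select)
    rowb = [1 - bit for bit in row]
    return row, rowb


select = [1, 0, 1, 1]   # unique 0 selects the second row
data = [0, 1, 1, 0]     # broadcast row data from the control unit
print(mux_row_lines(select, data, send_data=False))
print(mux_row_lines(select, data, send_data=True))
```

When data is sent, the complement lines carry the inverted data, which matches the later statement that ROWB contains data (inverted).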

The row and column select lines are arranged
as shown in Figure 7 so that each bit of the
broadcast row data is connected to every
processing element PE in a given row and similarly for
columns and column data. A bit of row data is
therefore used to identify a particular row or to
broadcast the same pattern of bits to every
column. Similarly column data identifies a particular
column or broadcasts a pattern to every row.
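
The effect of this wiring can be shown with a small sketch, given here purely as an illustration. The helper names `broadcast_rows` and `broadcast_cols` are hypothetical and a square array is assumed.

```python
def broadcast_rows(row_data: list[int]) -> list[list[int]]:
    """Bit r of the row data reaches every PE in row r."""
    n = len(row_data)
    return [[row_data[r]] * n for r in range(n)]


def broadcast_cols(col_data: list[int]) -> list[list[int]]:
    """Bit c of the column data reaches every PE in column c."""
    n = len(col_data)
    return [list(col_data) for _ in range(n)]


grid = broadcast_rows([1, 0, 1])
# Reading down any one column recovers the full row-data pattern.
first_column = [grid[r][0] for r in range(3)]
print(first_column)
```

Reading down any column of the result gives the same bit pattern, which is the sense in which row data broadcasts a pattern to every column.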

Using a decoder, multiplexer and processing 
elements PE of the type described it is possible to 
execute three principal types of instruction. 

1. DATA INPUT TO THE PE

In this case the data multiplexer selects an
input that may be loaded into PE registers or
combined with existing contents of PE registers in
various ways before being written to the registers.
The options are:-

(a) Signals ROWB and COLB are used near
the bottom of the data multiplexer as shown in
Figure 3. If both are TRUE then the other select
signals for the multiplexer determine which
particular signal is input to the PE. For example, signals
CTL10 through CTL13 select a value from one of
four nearest neighbour PEs (N, E, S or W) and
signal CTL6 selects the memory input bit MI of the
same processing element as operand, in which
case the processing element reads the data for
processing from its own local store.

(b) If COLB is TRUE, ROWB contains data
(inverted) and all the other data multiplexer select
signals are FALSE then the input to the PE is the
data broadcast from the MCU on the row data
lines.

EP 0 375 401 A1

(c) Similarly if ROWB is TRUE, COLB con- 
tains data (inverted) and all the other data mul- 
tiplexer select signals are FALSE then the input to 
the PE is the data broadcast from the MCU on the 
column data lines. 

Thus by transmitting the appropriate signals on 
the row and column select lines we can select row 
data, column data, or data from the local store as 
the input to the processing element PE. 
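
Options (a) to (c) can be summarised in a behavioural sketch. It does not reproduce the gate-level multiplexer of Figure 3 and covers only the three combinations listed above; the function name `pe_input` and the parameter `selected` (the value chosen by the other select signals, or `None` when they are all FALSE) are assumptions made for this example.

```python
from typing import Optional


def pe_input(rowb: int, colb: int, selected: Optional[int] = None) -> int:
    """Return the PE input for the three options described in the text.

    rowb, colb: levels on the complement select lines at this PE.
    selected:   value chosen by the other select signals (for example a
                neighbour or the memory input bit), or None if all of
                those select signals are FALSE.
    """
    if selected is not None:
        if rowb and colb:
            return selected              # option (a)
        raise ValueError("combination not covered by options (a) to (c)")
    if colb:
        return 1 - rowb                  # option (b): ROWB holds inverted row data
    if rowb:
        return 1 - colb                  # option (c): COLB holds inverted column data
    raise ValueError("combination not covered by options (a) to (c)")


print(pe_input(1, 1, selected=1))   # value from neighbour or local store
print(pe_input(0, 1))               # broadcast row data
print(pe_input(1, 0))               # broadcast column data
```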



2. RESPONSE OUTPUT FROM THE PE 

In this case the data multiplexer selects data
that is to take part in the response. The selected
value is output from the individual PE from the
output shown near the bottom right of Figure 3 and
combined with the other PE outputs. Dedicated
lines are provided for the response outputs as
shown in Figure 6. The outputs are combined at
AND units provided along two edges of the array 1
and the result of either the row AND or the column
AND is returned to the MCU. The options are:-

(a) Signals ROWB and COLB are both TRUE
and the other select signals for the multiplexer
choose a particular signal to use as the PE
response. For example, CTL6 selects the memory
input bit (MI) of the same PE. The outputs of all the
PEs are ANDed together by rows or by columns
dependent on the instruction.

(b) If COLB is TRUE and ROWB contains a
unique 1, then the response output is a function
both of the row data and what is selected by the
other multiplexer control signals. The effect is that
in the selected row, the memory data MI, for
example, is output on the response, and in all other rows
the output is TRUE regardless of the value of MI.
When the outputs of all the PEs are combined by
ANDing all the rows together, the result is the
same as a single row of MI values, the row
concerned having been identified by the position of the
unique bit in the row decode.

(c) Similarly if ROWB is TRUE, and COLB
contains a unique 1, a selected column value may
be returned to the MCU.
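
The way a single row or column is recovered by ANDing the responses can be illustrated as follows. This is a simplified model written for this description: the names `pe_response`, `read_row` and `read_column` are hypothetical, and the memory input bit MI is taken as the selected value throughout.

```python
def pe_response(rowb: int, colb: int, mi: int) -> int:
    """A selected PE outputs MI; an unselected PE outputs TRUE."""
    return mi if (rowb and colb) else 1


def read_row(memory: list[list[int]], row: int) -> list[int]:
    """Option (b): COLB all TRUE, ROWB a unique 1 at the wanted row.

    The responses of all rows are ANDed together, position by position.
    """
    n_rows, n_cols = len(memory), len(memory[0])
    out = []
    for c in range(n_cols):
        bits = [pe_response(1 if r == row else 0, 1, memory[r][c])
                for r in range(n_rows)]
        out.append(int(all(bits)))
    return out


def read_column(memory: list[list[int]], col: int) -> list[int]:
    """Option (c): ROWB all TRUE, COLB a unique 1 at the wanted column."""
    out = []
    for row_bits in memory:
        bits = [pe_response(1, 1 if c == col else 0, mi)
                for c, mi in enumerate(row_bits)]
        out.append(int(all(bits)))
    return out


memory = [[1, 0, 1],
          [0, 1, 1],
          [1, 1, 0]]
print(read_row(memory, 1))
print(read_column(memory, 2))
```

Because every unselected PE contributes TRUE, it has no effect on the AND, and the result equals the stored values of the selected row or column.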



3. DATA OUTPUT FROM MEMORY 

The output to memory, which is a single bit for
each PE, is shown as MO in Figure 2. The output is
gated by a memory write select unit MWS which in
turn receives an input from the activity control. The
value output therefore depends on the activity
select logic, the output of which is referred to simply
as the "activity", and on the MWS logic. The
function of the MWS logic is as follows: when
activity is TRUE for a particular PE, then the output
of the PE, that is the result of the sum function, is
selected to be written to memory; when activity is
FALSE the old memory contents which were
previously captured in the S register are selected and
re-written to memory so that there is no change in
memory contents.
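
The MWS selection can be expressed in a few lines, shown here as a sketch over the whole array. The function name `write_back` and the representation of the array as nested lists are assumptions made for this example.

```python
def write_back(sum_out: list[list[int]], s_reg: list[list[int]],
               activity: list[list[int]]) -> list[list[int]]:
    """Return the bits MO written to memory for every PE.

    Where activity is TRUE the result of the sum function is written;
    where it is FALSE the old contents held in the S register are
    re-written, so that the memory is unchanged.
    """
    return [
        [new if act else old
         for new, old, act in zip(new_row, old_row, act_row)]
        for new_row, old_row, act_row in zip(sum_out, s_reg, activity)
    ]


sum_out = [[1, 1], [0, 0]]
s_reg = [[0, 1], [1, 0]]
activity = [[1, 0], [0, 1]]
print(write_back(sum_out, s_reg, activity))
```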

As shown in Figure 3 the input to the activity
control is taken from the row/column select lines
and so uses row or column data. The options are:-

(a) If CTL2 and CTL3 are both FALSE then
the activity options are specified by CTL0 and
CTL1. The available options are:-

i. Activity = "All" i.e. TRUE in every PE
ii. Activity = A, where A is the value of
the A-register in the same PE.
iii. Activity = complement of A
iv. Activity = "None" i.e. FALSE in every
PE. This option is not normally useful but is
concerned with outputting information other than the old
memory data onto MO.

In these cases the data multiplexer selects data
to be operated upon by the sum function to create
the memory write data. This may include the
broadcast of data on ROWB or COLB.

(b) If CTL2 is TRUE, CTL3 is FALSE and
ROW is a unique 0, then the activity options above
are all ANDed with the selection of the row
specified by the position of the unique 0. Again the data
multiplexer may select data to be operated upon
by the PE including the selection of data on the
COL signals. In the unselected rows, signal ROWB
is FALSE thus causing the PE input to be FALSE,
but this is unimportant since in such PEs the old
memory contents are re-written regardless of the
PE data input.

(c) Similarly if CTL3 is TRUE, CTL2 is
FALSE and COL is a unique 0, then the activity is
TRUE only in a selected column.
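
The activity options can be gathered into one behavioural sketch. The text does not give the encoding of CTL0 and CTL1, so the four base options are represented by a hypothetical enumeration `Base` rather than by bit values; the function name `activity` is likewise an assumption, and the case in which CTL2 and CTL3 are both TRUE is not described and is rejected.

```python
from enum import Enum


class Base(Enum):
    """The four options chosen by CTL0 and CTL1 (encoding not specified)."""
    ALL = "all"
    A = "a"
    NOT_A = "not_a"
    NONE = "none"


def activity(base: Base, a: int, ctl2: bool, ctl3: bool,
             row: int, col: int) -> bool:
    """Return the activity of one PE.

    a:        value of the A-register in this PE.
    row, col: levels on the TRUE select lines at this PE; a 0 means
              that the row or column is selected.
    """
    if ctl2 and ctl3:
        raise ValueError("CTL2 and CTL3 both TRUE is not described")
    if base is Base.ALL:
        act = True
    elif base is Base.A:
        act = bool(a)
    elif base is Base.NOT_A:
        act = not a
    else:
        act = False
    if ctl2:
        act = act and row == 0       # option (b): selected row only
    if ctl3:
        act = act and col == 0       # option (c): selected column only
    return act


print(activity(Base.A, a=1, ctl2=False, ctl3=False, row=1, col=1))
print(activity(Base.ALL, a=0, ctl2=True, ctl3=False, row=1, col=1))
```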

Thus in the arrangement described above the
same row data paths (ignoring the fact that there
are TRUE and complement versions) are used for
three distinct purposes at different times: broadcast
of data; selection of a row for response output; and
selection of a row for activity controlled write. The
column data paths have corresponding functions
with orthogonal orientation.

Claims

1. A processor array comprising a number of
interconnected processing elements (PE), each
processing element (PE) having row select and
column select inputs connected via respective row
and column select lines to a control unit (MCU),
characterised in that the row and column select
inputs in each row or column are connected in
common and the processing elements (PE) are
arranged to receive broadcast row and column data
over their respective row and column select lines.

2. A processor array according to claim 1, in
which each processing element (PE) includes an
activity control unit and the activity control unit is
connected to the respective row and column select
inputs and arranged to receive activity control
signals from the row and column select inputs.

3. A processor array according to claim 1 or 2,
including multiplexers (MUX) connected between
the control unit and the row and column select
lines and arranged to multiplex row or column
select signals and broadcast row or column data
onto the lines.

4. A processor array according to claim 3,
including an address decoder (DEC) arranged to
receive row or column addresses from the control
unit (MCU) and to transmit to the multiplexers
(MUX) appropriate row or column select signals
derived from the row or column addresses.

5. A processor array according to any one of
the preceding claims, in which each processing
element (PE) includes on its input side a data
multiplexer (DATA MUX) arranged to receive data
from neighbouring processing elements and from
the respective row and column select inputs.